DOCUMENT RESUME

ED 096 997    IR 001 178

AUTHOR: Carmon, James L.
TITLE: "SDI -- Where Are We? The Challenge of the Future." The Information Dissemination Center View.
INSTITUTION: Georgia Univ., Athens.
PUB DATE: Oct 74
NOTE: 19p.; Paper presented at the Annual Meeting of the American Society for Information Science (37th, Atlanta, Georgia, October 1974)

EDRS PRICE: MF-$0.75 HC-$1.50 PLUS POSTAGE
DESCRIPTORS: Bibliographic Coupling; *Computer Oriented Programs; Data Bases; *Information Centers; Information Dissemination; Information Networks; *Information Retrieval; Information Scientists; Library Reference Services; Library Science; Man Machine Systems; On Line Systems; *Search Strategies; Speeches; Use Studies
IDENTIFIERS: SDI; Selective Dissemination Of Information



ABSTRACT 

The historical and current status of information
dissemination centers and the problem of user interface are reviewed.
During the past decade, the problems of technical data processing
have been conquered; information dissemination has evolved from a
loosely knit group of experimental centers to an organization of
established centers, many operating multiple data bases. Competitive
data bases are becoming available in a number of subject fields,
putting the centers in a better bargaining position with the data
base producers. However, on-line retrieval, resource sharing, and
networking must solve the common problem of user interface before
any one or any combination of these operating modes can be really
effective. Interactions between the user with his question, the
intermediary (the profile code processor), and the search system with
its data base are critical to continued evolution of information
centers. The intermediaries will, for some time, be the most
effective bridge between the users and the computer-based retrieval
services. The breakthrough needed for both on-line and batch
retrieval systems is the understanding, modelling, and simulation of
the man-machine interfaces which are now handled by the
intermediaries. (WCM)






"SDI -- Where Are We? The Challenge of the Future"

The Information Dissemination Center View

J. L. Carmon

With a topic as broad as this one, and with free
license from our session chairman to explore within it,
the problem is not what to discuss but rather what aspect
can be covered in the 20 minutes allotted. I would like
to address one of the future challenges for SDI from
the point of view of the information dissemination center --
the organizational entity which has evolved over the past
decade to handle the retrieval processing of the
computer-readable bibliographic data bases. More particularly, I
would like to address the problem which we, in our center,
see as the next major research and development hurdle to be
cleared if SDI services are to continue to develop in the
future as they have in the past. After briefly reviewing
the historical evolution of information dissemination centers
in general and surveying their current status, I'll turn
attention to the problem which I'll refer to as the User
Interface and, I hope, convince you that it is indeed of
greater magnitude and complexity than has generally been
recognized, and that it will require concentrated attention
by researchers and practitioners in Information Science
and allied fields if we are ever to realize the blue-sky
dreams of general and widespread access to and use of
bibliographic retrieval services through some network utility.



Before going any further, I want to define some terms
the way I will be using them, since my usage may differ from
that of some of the other panelists. I'm not sure how SDI was defined
in setting up this SIG, but its use in the literature has
varied. Most authors limit its scope to current awareness
searches, but some give it a broader scope. I will be using
SDI in its broadest possible context -- that is, the selection
of information for dissemination in response to a request. No
time frame is implied in the words themselves, and I choose to
include such types of retrieval as have been labeled
current awareness, retrospective, demand, customized, special,
mission-oriented, and so forth.

Other terms which require clarification include "center",
"intermediary", "user", and "data base producer or vendor".
The "center" is the organizational entity or group which
processes one or more computer-readable bibliographic data
bases for the purpose of distributing bibliographic citations
in response to individual queries. Thus, centers may be
for-profit or not-for-profit; located in a library or a computer
center; or set up as an independent organization, as
part of a government agency, or as part of a data base
producer's services. My point is that the term "center"
will be used in its broadest context and should not be equated
to any particular type of center or operating mode. Another
term which was mentioned was "intermediary" -- or "profiler".
By these terms, which will be used synonymously, I refer to
the human being who interacts in any way with the user or his
question and the search system, including such components of
the search system as the data bases. These intermediaries
are known by many names -- e.g., information specialist,
reference librarian, information analyst, and profile analyst.
Again, the broadest possible scope should be associated with
my use of the general term "intermediary", even though specific
functions may vary from center to center and not all
functions may be performed in any given center. A "user", in
my frame of reference, is the person with the information
need -- the person who wants an answer to a question. A user
may interact directly with a search system on his own, but
more often he is one member of the team -- the other being an
intermediary -- who interacts with the system. The last term
to be defined is "data base producer or vendor" -- the
organizational entity that creates the machine-readable bibliographic
data base. Like centers, they may be for-profit or
not-for-profit, located in a government agency or with a professional
society, or there may be any of a number of other possibilities.
If a given organization both produces and searches its own
data base, then it is both a data base producer and a center.

So much for definitions. Let me turn now to a brief
history of the development of information dissemination centers
as a means of providing perspective for where we are, where I
think we are going, and what it will take to get there.



History 

Information dissemination centers using machine-readable
data bases had their beginning back in the early 1960s --
just a little over a decade ago --
with the establishment of the Medlars and RDC centers by the
National Library of Medicine and NASA, respectively. They
were mission-oriented and heavily subsidized by the federal
government, and these two data bases were limited to processing
by the agency-sponsored centers. In the not-for-profit sector,
Chemical Abstracts Service led the way with publicly available
data bases, first with Chemical Titles about 1962, and a few
years later with CBAC and POST. In these early years, user
groups tended to build up around individual data bases -- the
Medlars centers got together to discuss common problems, as
did the NASA centers and the CAS tape users. During those
first few years, our user groups struggled with such problems
as debugging search programs (which were often supplied with
the data base), arguing the pros and cons of various search
techniques, teaching each other how to prepare profiles, and
persuading users to do their searches by computer. Retrieval
systems, as a concept, did not exist at that time -- we
still spoke in terms of search programs. And the file
structures reflected their unit record heritage -- card image
records, with fixed length fields, numerically encoded index
terms, and print-oriented data representation.

Several significant changes have come about during the
past decade -- changes which not only reflect the rapid
maturing of an infant industry (we've been diapered and burped
publicly on a number of occasions), but also reflect major
changes in what centers do, the user communities they serve,
and relationships between centers and data base producers.
On the technical side, we've moved from the single processing
shops of 1401s and 7094s to third generation computer hardware
with its versatile operating systems, applications software,
and multiprocessing environment with telecommunications access.
The self-defining, directory-oriented, variable length file
structures, such as defined by the ANSI standard for
bibliographic information interchange on magnetic tape, are now
state-of-the-art and are being adopted by more and more data
base producers as they convert their data processing operations
to integrated computer-based production operations. Search
programs have evolved to large and relatively sophisticated
retrieval systems, capable of handling multiple data bases
with varying content and format, often with many of the
processing operations under user or intermediary control
(e.g., format, content, location, and media in which the
search results are delivered). Computer programming, profile
construction, and data base conversion are state-of-the-art
and part of the routine operations of all but the youngest of
information dissemination centers. The ASIDIC meetings, which
now attract as many as 80 attendees from among 30 full members
and 50 associate members, are now devoted to topics which
reflect the interactions of centers with their environment.
With data base producers, the hot topics are lease and license
provisions, royalty payments, usage restrictions, and networking
implications. With libraries, two areas of interaction are
drawing attention: one concerning the interface with reference
librarians and the incorporation of the intermediary functions
into reference librarianship, and the other dealing with the
location and delivery of documents which are identified through
the computer-based retrieval services.

In summary, during the past decade we have largely
conquered the technical data processing problems; we have
evolved from a loosely knit group of experimental centers
serving small parochial user groups to an organization of
established centers, many of whom operate multiple data bases
and serve a nation-wide user community in a competitive
environment which provides shopping choices to those users. Competitive
data bases are now becoming available in a number of subject
fields, putting the centers in a better bargaining position
with the data base producers and, indirectly at least, providing
motivation for improved data base quality and serious
consideration of unjustified incompatibilities between data bases.

This brings us to the present. What about the future?

The hue and cry now is on-line retrieval, resource sharing,
and networking. These three concepts are by no means the same
thing -- on-line retrieval may be done via a telecommunications
utility but need not necessarily be part of a network, in the
sense of having anything in common with other users of the
utility. There are several centers which make their on-line
retrieval services accessible via the Tymshare communications
system yet have no relationships -- in fact are competitive --
with each other. Similarly, several centers may agree to
share resources, thus constituting a network, without using
telecommunications. The NASA RDC centers, for example,
comprise such a network of centers without telecommunications
links. However, on-line retrieval, resource sharing, and
networking do have one very important problem in common which
must be solved before any one or any combination of these
operating modes can be really effective, and that is the user's
interface to the search system.

The User Interface Problem 

I can practically hear the shrugs -- "What's the big
deal about user interface? You prepare some good profile
coding manuals, run a training session, and the problem is
solved." And I might add that if we had been told the same thing
a few years ago, we would probably have shrugged with the
same answer. However, over seven years' experience as a center,
some 20 different data bases, and over 6 million document
records in the retrospective collection have taught us
differently. And I hope to convince you that understanding
the interactions between the user with his question, the
intermediary (if one is imposed), and the search system with
its data bases is critical to continued evolution of
information dissemination centers. It is the major block to
effective use of on-line search services and to the sharing
of data base resources, regardless of whether networking per
se comes about.






I emphasize the word effective, because it is certainly
true that on-line searching and profile exchange are going
on. But experience in our center raises serious concerns
which we, as information science professionals, should have
about the quality of the results being obtained. (For
those of you who may not know, the University of Georgia
Computer Center operates a center which has remote input
and output terminals located in New York, Ohio, and Atlanta,
as well as several terminals on site in Athens.)

Does this look familiar (slide 1)? It should, because this
diagram or a similar one appears in almost every profile
coding manual or textbook on reference librarianship.
Different names have been applied and the various sources may
differ somewhat on the descriptions of the functions, but
most of them present steps which are similar to those given
in Figure 1. Descriptions normally concentrate on "what"
is to be done with little or no attention to "how". The
librarian or profiler is exhorted to discuss or negotiate the
user's question until it is clearly defined, but there is
little guidance as to what constitutes a clear question or
what techniques can be used to arrive at it. The same
situation applies to other steps in the process, some more
than others, of course. Identify the concepts -- parenthetically,
the "important" concepts -- but what constitutes
important concepts? The next step may be something like
"expand the concept", which means to add the vocabulary appropriate
to the data bases -- or what Lancaster calls "indexing the
query". This profile coding process is often more art than
science. In spite of the importance profile construction
plays in the effectiveness of the retrieval, we know
virtually nothing about the decision-making processes and
the sources and characteristics of the information used to
make these decisions for creating good profiles.
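To make the "identify the concepts" and "expand the concept" steps concrete, here is a minimal sketch in Python. It assumes a profile is a set of concept groups, where a record matches when every group contributes at least one term; that logic, the function names, and the sample vocabulary are all illustrative assumptions, not a description of any center's actual profile coding rules.

```python
# Hypothetical sketch: a profile as a list of concept groups, each group
# holding the alternative terms chosen for one data base's vocabulary.

def expand_concept(concept, vocabulary):
    """Return the concept plus any data-base terms listed for it."""
    return [concept] + vocabulary.get(concept, [])

def build_profile(concepts, vocabulary):
    """Build one group of alternative terms per concept in the question."""
    return [expand_concept(c, vocabulary) for c in concepts]

def matches(profile, record_terms):
    """A record matches if every concept group has at least one term present."""
    terms = {t.lower() for t in record_terms}
    return all(any(t.lower() in terms for t in group) for group in profile)

# Illustrative vocabulary for one imaginary data base (not from the paper).
vocabulary = {
    "pesticide": ["insecticide", "herbicide"],
    "water": ["groundwater", "streams"],
}

profile = build_profile(["pesticide", "water"], vocabulary)
hit = matches(profile, ["Herbicide", "Groundwater", "Georgia"])
miss = matches(profile, ["Herbicide", "Soil"])
```

The mechanical part is trivial; the hard part, and the subject of this talk, is deciding which concepts are "important" and which terms belong in each group for a given data base.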

Last year, the dissemination centers at UCLA and at
Georgia launched a joint study to investigate the functions,
processes, and roles which take place in the interface
between user and system -- what we call our "interface"
study. This joint study has two major phases, the first of
which is to develop a model of the interface process as it
now exists. This has been called the Manual Model, since most
of the functions are performed manually by trained
intermediaries. The fact that there are two centers involved is
important, because we are concerned not only about processes
within a given center but also about differences which exist
between centers. Thus, the study has proceeded independently
in each center but in parallel, through the use of jointly
defined measuring instruments so the data can be compared.
The second phase of the study, which will follow development
of the Manual Model, is the creation of one (or perhaps more
than one) model based on a networking environment (this has
been dubbed the "Network Model"). It should be clearly
understood that we are looking at networks involving multiple
dissemination centers, rather than a single, central
dissemination center servicing a distributed user population through a
communications utility, although the results may be applicable
to both.






Over the past 10 months we have collected data on many
different characteristics of the interface process and from
several points of view. Analysis of these data for
development of the models is not yet complete, but the findings
already indicate that the interface process is far more
complex than we anticipated. As shown in slide 2, the major
variables being investigated are related to the user, the
question, the data bases, the intermediary, and the search
system. Typical characteristics of the user which are being
considered (slide 3) include the purpose for which the search
is being done (e.g., a class project or term paper, a
dissertation, instruction or teaching, a research project,
a patent search, etc.), familiarity with the topic being
searched (e.g., is it a new project about which the user knows
little or nothing, is it final wrap-up on a journal article
or dissertation to be sure nothing has been missed, or is it
perhaps grist for a review article or book?), familiarity
with literature resources in the field (e.g., can the user
select the appropriate data bases?), prior experience with
computer-based search services (that is, a new user or one
with prior experience?), and others, as you see listed. For
the question (slide 4), we are looking at such things as the
clarity with which it is expressed (i.e., how well formulated
is the question?), the completeness with which the initial
question is presented (information on this can be obtained by
comparing the user's initial question with the negotiated
question), and the scope of the question (that is, is it a
broad question intended or expected to retrieve a large
number of answers, or is it a narrow, precise question
which can be answered with a single, relevant document?).
To the extent that the profile is a surrogate of the question,
we are also interested in characteristics of the profiles
and their relationships to the initial question. In the area
of data bases (slide 5) we are investigating such characteristics
as the size (in terms of both the number of records per some
fixed unit of time, such as a year, and also the size of the
retrospective collection as a whole). Two other factors
believed to be very critical in terms of the roles which
intermediaries now play in preparing profiles are related to the
vocabulary characteristics of the various data bases (that is,
controlled versus uncontrolled, classification versus indexing,
and various combinations of these and other attributes) and
also the data content of the data bases. When, for example,
is it appropriate to search the abstract, and when is it better
to stick with assigned index terms or codes? Should the search
strategy, hence the profile, differ depending on whether or not
the abstract is being searched? Those of you who have done a
great deal of profile preparation will know that this is not a
simple yes-no decision. It depends on how much you expect to
be retrieved, how good the index vocabulary is relative to the
particular question at hand, how large the data base is and
how much its coverage overlaps the subject matter of the
question, and so on. I won't go into characteristics of the
other major variables -- the search system and its logic and
retrieval features, the background and training of the
intermediaries, etc. -- but I hope I have illustrated even
briefly how complex the process is when all the combinations
and their associated interactions are considered.

Several different data collection approaches have been used in this
study -- questionnaires filled out independently by the users
and the intermediaries, and tape recorded interviews which
have been transcribed and analyzed for the presence or absence
of over 60 characteristics and have been described in terms of
event time series. Data have also been collected on the data
bases, one subtask of which is the creation of a merged
vocabulary file of an estimated half-million terms or
term-pairs for about 13 of the data bases used in our center. This
master vocabulary file, which is designed around a
thesaurus-like structure, forms the basis for study of the similarities
and differences in indexing terminology between the various
data bases. There has also been a detailed linguistic analysis
of the transformations which occur in going from the narrative
form of the user's question to the formalized profile
representation as prepared for search against one or more of
our data bases. Transformations which are data base dependent
are of particular interest in this phase of the study.
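A merged vocabulary file of this kind can be pictured with a small sketch. The Python below assumes each data base contributes a flat list of index terms and that terms are compared after trimming and lower-casing; the data base names, the sample terms, and the function names are hypothetical, and the real file's thesaurus-like structure is certainly richer than this.

```python
from collections import defaultdict

def merge_vocabularies(vocabularies):
    """Map each normalized term to the set of data bases that use it."""
    merged = defaultdict(set)
    for data_base, terms in vocabularies.items():
        for term in terms:
            merged[term.strip().lower()].add(data_base)
    return merged

def shared_terms(merged, base_a, base_b):
    """Terms that appear in the vocabularies of both data bases."""
    return sorted(t for t, bases in merged.items()
                  if base_a in bases and base_b in bases)

# Illustrative vocabularies for two imaginary data bases.
vocabularies = {
    "BASE_A": ["Enzymes", "Catalysis", "Polymers"],
    "BASE_B": ["enzymes", "Proteins", "catalysis "],
}

merged = merge_vocabularies(vocabularies)
common = shared_terms(merged, "BASE_A", "BASE_B")
only_a = sorted(t for t, bases in merged.items() if bases == {"BASE_A"})
```

Even this toy version shows the two questions such a file can answer: which terms carry over when a profile moves from one data base to another, and which terms need a data base dependent transformation.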

As I mentioned earlier, we have collected most of the
information needed for development of the Manual Model, but
are still working on the statistical analysis and interpretation
of the data. Based on our preliminary findings, I would have
to say that we have only scratched the surface of the problem
and will undoubtedly raise far more questions to be
investigated further than we will be able to answer. As
Saracevic has pointed out, "The human factor, the variations
introduced by human decision-making, seems to be the
overwhelming variable, the major influencing factor affecting
the performance of every and all components of an information
retrieval (IR) system". However, I believe we cannot simply
rest on the matter by acknowledging its complexity. We must
devote at least as much attention and effort to this critical
area of computer-based retrieval as has been poured into
building the data bases in the first place, comparing indexing
techniques, and programming complex retrieval systems, if for
no other reason than to understand the functions and techniques
of profile preparation in sufficient detail to effectively
train our reference librarians and information specialists.
These intermediaries will for some time be the most effective
bridge between the users and the computer-based retrieval
services offered by information dissemination centers like
ourselves.

For my colleagues who say that on-line is the only way
to go, I might respond that there is considerable evidence
that both on-line and batch retrieval systems are presently
being used in essentially the same mode. It is true that
the on-line systems complete the search itself faster than do
most batch-oriented shops in terms of elapsed time, but this
is the only significant difference at the present time between the two
types. At the ASIDIC meeting a couple of weeks ago, one of
the data base vendor representatives who uses his own data
base in on-line mode reported an average of 40 minutes for
construction of the profile (off-line by a user-intermediary
team), 18 minutes of terminal connect time to enter and
search the profile, and 30 minutes to review the results
for relevance. These are almost identical timings to those
we get in our center, where we use an on-line data entry system
for input to batch search. The on-line systems have certainly
shortened the elapsed turn-around time for the search, but
they have not changed the process significantly, and in fact
those on-line centers who started out trying to peddle
terminals directly to users have rediscovered what we learned
back in 1965 -- the majority of the users don't have any
aspirations toward being information specialists; they just
want the results. At the present time, on-line search systems
look like the early days of computer-assisted instruction --
very expensive page turners with little or no advantage being
taken of the interactive potentials of the computer. The
breakthrough needed for both on-line and batch retrieval
systems is in the understanding, modelling, and simulation
of the man-machine interfaces which are now handled by those
artists, the intermediaries.
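The arithmetic behind the vendor's timings is worth spelling out. The short Python sketch below uses only the three figures reported above (40, 18, and 30 minutes); the variable names are mine, and the breakdown is an illustration of the reported averages rather than a measurement of any particular system.

```python
# Average minutes per search, as reported at the ASIDIC meeting.
timings = {
    "profile construction": 40,   # off-line, user-intermediary team
    "terminal connect time": 18,  # entering and searching the profile
    "relevance review": 30,       # reviewing the results
}

total = sum(timings.values())

# Share of the whole process taken by each step, in percent.
shares = {step: round(100 * minutes / total, 1)
          for step, minutes in timings.items()}

# Time spent away from the terminal, on human judgment alone.
human_minutes = total - timings["terminal connect time"]
```

On these figures the whole process takes 88 minutes, of which only 18 (about a fifth) are spent at the terminal. Speeding up the search itself leaves the other 70 minutes untouched.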






Figure 1. Major Processing Functions of the Reference Process (Steps 1 through 5)



MAJOR VARIABLES

. USER

. QUESTION

. DATA BASES

. INTERMEDIARY

. SEARCH SYSTEM



CHARACTERISTICS OF THE QUESTION

. CLARITY WITH WHICH IT IS EXPRESSED

. COMPLETENESS WITH WHICH IT IS
PRESENTED

. SCOPE OF THE QUESTION

. CHARACTERISTICS OF THE PROFILES AND
THEIR RELATIONSHIPS TO THE INITIAL
QUESTION



USER CHARACTERISTICS

. PURPOSE OF SEARCH

. FAMILIARITY WITH THE TOPIC

. FAMILIARITY WITH LITERATURE
RESOURCES

. PRIOR EXPERIENCE WITH COMPUTER-
BASED SEARCH SERVICES




CHARACTERISTICS OF THE DATA BASES

. SIZE

. VOCABULARY CHARACTERISTICS

. DATA CONTENT



